ABSTRACT

This paper discusses explanatory schema acquisition, a learning technique with
several  interesting  properties.  It  does  not  require  a  teacher  or  concept  matching 
predicate  to  be  provided.  It  does  not  rely  on  searching  a  concept  space  to  produce 
generalizations.  It  can  acquire  a  new  concept  based  on  only  one  input  example, 
although  later  inputs  might  result  in  refinement  of  learned  concepts.  These  features 
are  made  possible  by  taking  a  very  knowledge-based  approach.  The  concepts  that  are 
learned  are  problem-solving  schemata.  Thus,  the  technique  is  not  applicable  to  all 
types of learning. However, it provides a unique perspective on a large and
interesting class of learning.


Subfield:  Learning  and  Knowledge  Acquisition 
Category:  Short  Paper 
1. Introduction

This paper gives an overview of a learning technique being developed at the
University of Illinois. The technique, called explanatory schema acquisition, has
some interesting properties. For example, it does not require a teacher or other
oracle to select important examples; it is capable of one-trial learning; and, contrary
to Mitchell's recent taxonomy of learning systems [14], it does little or no
searching in the process of acquiring a new concept.

Before  describing  the  technique  we  will  pause  briefly  to  consider  what  we  might 
call  the  "standard  theory  of  concept  formation."  This  approach  underlies  much  of  the 
concept  learning  work  in  psychology  and  AI.  In  the  standard  theory,  a  system  is 
given  a  number  of  inputs.  Each  input  is  composed  of  a  set  of  features.  A  concept  is 
a  conjunctive  and/or  disjunctive  combination  of  features.  An  input  with  the  proper 
combination of features is an instance of the concept; otherwise it is not an
instance. A teacher, usually a human, supplies sample inputs to the system together
with an indication of which concept (if any) each input is an instance of. The
system's  task  is  to  discover  the  combination  of  features  that  compose  each  concept. 
This  approach  has  been  fruitfully  applied  to  many  diverse  domains  (for  example,  [12], 
[13],  [21])  and  is  a  cornerstone  of  the  field  of  inductive  inference. 
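
As a point of comparison, the standard theory can be illustrated with a minimal sketch of a tutored learner for purely conjunctive concepts. It is not taken from any of the cited systems; the feature names, the training data, and the function names learn_conjunction and matches are assumptions made for illustration only. The learner keeps the feature values shared by every input the teacher labels as an instance, so it needs several labeled inputs before the concept is isolated.

```python
from typing import Dict, List, Optional, Tuple

Example = Dict[str, str]  # feature name -> feature value


def learn_conjunction(labeled: List[Tuple[Example, bool]]) -> Optional[Example]:
    """Return the feature/value pairs shared by every positive example."""
    hypothesis: Optional[Example] = None
    for features, label in labeled:
        if not label:
            continue  # this simple sketch ignores negative examples
        if hypothesis is None:
            hypothesis = dict(features)  # start from the first positive example
        else:
            # keep only the feature values the new positive example shares
            hypothesis = {k: v for k, v in hypothesis.items()
                          if features.get(k) == v}
    return hypothesis


def matches(concept: Example, features: Example) -> bool:
    """An input is an instance if it has every feature value in the concept."""
    return all(features.get(k) == v for k, v in concept.items())


# Hypothetical teacher-supplied inputs (labels come from the teacher).
training = [
    ({"shape": "square", "color": "red", "size": "large"}, True),
    ({"shape": "square", "color": "blue", "size": "large"}, True),
    ({"shape": "circle", "color": "red", "size": "large"}, False),
]
concept = learn_conjunction(training)
```

Note that the sketch depends entirely on the labels: without the teacher there is nothing to generalize over, which is the difficulty the next paragraphs take up.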
Now we can ask how we might construct an untutored concept learning system. At
first this seems a bit of a non sequitur. Removing the teacher appears to result in
no learning at all. The notion of forming a concept from a single input also seems
suspicious.

The key is to adopt a much more knowledge-based approach. The learning algorithm
to be described requires access to a large amount of domain knowledge. It is
through reconciling a new input with the domain knowledge that learning and
generalization occur.

This is NOT to say that the proposed learning technique is domain specific.
Techniques specific to a particular domain would be ad hoc and of very limited
theoretical interest. Rather, explanatory schema acquisition is domain independent.
Indeed, it has already been applied to three very different domains. The approach
does, however, require access to a rich domain model. It is interaction with this
rich domain information that determines whether or not concept acquisition is possible
or desirable for a new input. The interaction also guides the generalization
process.

2. Explanatory Schema Acquisition

The technique involves three logically distinct (but possibly temporally concurrent)
processes:

1)  The  new  input  is  understood. 

2) The input is evaluated to see if schema formation is warranted.

3)  The  input  is  generalized  to  a  new  schema. 
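
The three processes above can be sketched as a simple pipeline. This is only an illustrative outline under stated assumptions: the function name acquire_schema and its three callable parameters are hypothetical, and the sketch runs the steps strictly in sequence even though the text allows them to overlap in time.

```python
from typing import Callable, Optional, TypeVar

Input = TypeVar("Input")
Representation = TypeVar("Representation")
Schema = TypeVar("Schema")


def acquire_schema(
    new_input: Input,
    understand: Callable[[Input], Representation],
    is_warranted: Callable[[Representation], bool],
    generalize: Callable[[Representation], Schema],
) -> Optional[Schema]:
    """Run the three steps in order; return None if no schema is warranted."""
    representation = understand(new_input)   # 1) understand the input
    if not is_warranted(representation):     # 2) decide whether to generalize
        return None
    return generalize(representation)        # 3) build the new schema


# Toy stand-ins for the three processes (assumptions, not real components).
result = acquire_schema(
    "story text",
    understand=lambda text: {"text": text},
    is_warranted=lambda rep: True,
    generalize=lambda rep: ("SCHEMA", rep["text"]),
)
```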

For  illustrative  purposes  we  will  assume  that  the  input  is  the  following  brief 
natural language story. The assumption of natural language input is not necessary;
indeed, one of the current applications involves robot arm planning, which, of
course, is non-linguistic.

John, a bank teller, discovered that his boss, Fred, had embezzled $100,000.
John sent Fred an inter-office memo saying that he would inform the police
unless he was given $15,000. Fred paid John the money.


2.1. Understanding the Input

The requirements on the understanding process are not controversial. By "understanding"
we mean nothing more than constructing a causally complete representation
of the input. This requires that any crucial information missing from the story
be inferred and that the causal relations between components be discovered and made
explicit.  While  this  is  not  an  easy  task,  it  is  one  which  has  been  the  focus  of  a 
good  deal  of  research,  particularly  for  natural  language  texts  ([1],  [3],  [10],  [17], 
[19]). 

We  require  that  our  representation  have  one  component  that  is  not  generally 
included  by  understanding  systems.  We  require  that  the  understander  maintain  data 
dependency  links  ([4],  [6])  justifying  each  element  in  the  representation.  The  links 
connect  each  representation  event  with  all  of  the  inference  rules  from  the  domain 
model  that  were  used  to  justify  the  event  during  the  understanding  process.  This 
includes all causal information, goal enablements, planning information, etc. This
makes explicit in the final story representation the reasons the system had for connecting
events in a particular way. For example, in the blackmail story John's
demand that Fred give him $15,000 is explicitly mentioned. The system must infer
that John has the goal of possessing the $15,000. This is a necessary inference. A
system cannot be said to have "understood" the input (in any sense of the word) if it
does not make this inference.

We require not only that the inference be made but that it be justified by
including data dependency links to the appropriate inference rules. In this example,
the relevant inference rules state that all volitional actions are done in service of
goals and that any kind of request is probably done in service of the goal of possessing
the requested object.

By  and  large,  current  understanding  systems  do  not  include  these  backpointers  to 
inference  rules  in  the  final  representation.  References  to  the  inference  rules 
exist,  rather,  only  in  a  trace  of  the  understander's  processing.  In  most  current 
systems  they  are  available  but  simply  not  included.  We  will  insist  that  they  also  be 
explicitly  stated  in  the  understood  representation.  We  call  the  amalgam  of  all  of 
these  data  dependency  links  the  Inference  Justification  Network. 
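
One possible way to picture the Inference Justification Network is as a table of propositions, each carrying backpointers to the inference rules that justify it. The sketch below is an assumption about layout, not the system's actual representation: the class names, field names, rule names, and proposition identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Justification:
    rule: str                                          # name of the domain inference rule used
    supports: List[str] = field(default_factory=list)  # ids of supporting propositions


@dataclass
class Proposition:
    ident: str
    content: str
    stated: bool                                       # True if explicitly mentioned in the input
    justifications: List[Justification] = field(default_factory=list)


def rules_for(network: Dict[str, Proposition], ident: str) -> List[str]:
    """Follow the data dependency links from a proposition to its rules."""
    return [j.rule for j in network[ident].justifications]


# The inferred goal from the blackmail story, linked to the two rules
# described in the text (rule names are illustrative).
network: Dict[str, Proposition] = {}
for prop in (
    Proposition("demand", "John demands $15,000 from Fred", stated=True),
    Proposition(
        "goal", "John has the goal of possessing $15,000", stated=False,
        justifications=[
            Justification("volitional-actions-serve-goals", ["demand"]),
            Justification("requests-serve-possession-goals", ["demand"]),
        ],
    ),
):
    network[prop.ident] = prop
```

Keeping the links in the representation itself, rather than in a processing trace, is what lets the later generalization step ask which properties of an object the explanation actually depends on.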

2.2. To Generalize or Not to Generalize

There are five criteria to be considered when deciding whether or not to generalize
an input into a new schema. By hypothesis, we assume that the input did not
match an existing schema (if it had, the system would already possess the desired
schema, and indeed that schema would have been used to process the story). If any of
these five conditions does not hold, constructing a new schema from this input is
inappropriate.

The  criteria  are: 

1)  Is  the  main  goal  of  a  character  achieved? 

2)  Is  the  goal  a  general  one? 

3)  Are  the  resources  required  by  the  goal  achiever  generally 
available? 

4)  Is  this  new  method  of  achieving  the  goal  at  least  as 
effective  as  the  other  known  volitional  schemata  to  achieve 
this  goal? 

5) Does the input match one of the known generalizable
patterns? 

These criteria are tested for all goals in the story. The first criterion, "Was
the goal achieved?", is self-explanatory and easily judged. The second, "Is it a general
goal?", and the third, "Are the resources generally available?", require some
discussion.

Novelty alone in an approach to achieving a goal is not sufficient to warrant
constructing a new schema. Consider, for example, a plot from the "Mission Impossible"
television series. These plots are noteworthy in that they are very novel. They
all use bizarre methods to achieve rather peculiar goals. Furthermore, they are
always successful. However, the goals achieved are not the type that arise in ordinary
life, and the resources and skills needed are so specialized and uncommon that
the same solution would never be applicable again.

Clearly a new schema should be constructed only if there is a reasonable expectation
that it will be helpful in future processing. If a schema will never be used
again, it should not be constructed in the first place.

How can the utility of a particular goal be judged? The answer to this is
closely tied to where goals come from. Achieving a goal which arises from general
conditions important to an individual's well-being and using readily available
resources is likely to result in an interesting new schema, one which will arise
again and again. For the solution, we use an aspect of Schank & Abelson's theory of
planning [16]. In their view, themes give rise to the highest level goals (goals
which are not simply subgoals in the achievement of other goals). Interpersonal and
life themes are what we are interested in. An example of the former is a husband
offering (and therefore, at some level, wanting) to type a term paper for his overworked
student wife. Examples of the latter are attempting to satisfy one's hunger,
to gain money, or to relieve boredom. Life themes give rise to goals that require no
further justification. Our example, which demonstrates a new way to gain money,
relates directly to a life theme and therefore satisfies this criterion.

Criterion 4 is self-explanatory. The idea is that the system should not bother
constructing schemata that are much less efficient than similar already-known schemata.
In a natural language input this would occur only if a character were using a
highly sub-optimal plan. This criterion is somewhat application dependent. For certain
applications one might wish to construct all possible new schemata, even those
stemming from bad examples.

The fifth criterion has been discussed elsewhere [2]. As this is a short paper
describing on-going research, it is not appropriate to repeat that discussion here.
Suffice it to say that there is a taxonomy of explanatory acquisition techniques:
schema composition, secondary effect elevation, volitionalization, and schema
alteration. The technique that is matched has implications for exactly how the
generalization is performed.
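
The five criteria discussed in this section can be sketched as a conjunction of tests on a goal. This is a minimal illustration, assuming each criterion has already been judged and reduced to a boolean; the criterion names and the two sample assessments are hypothetical. In particular, the values for criteria 4 and 5 in the "Mission Impossible" case are assumed, since the text only says that such plots fail on goal generality and resource availability.

```python
from typing import Dict, List

# One name per criterion, in the order given in the text (names are illustrative).
CRITERIA = (
    "goal_achieved",
    "goal_is_general",
    "resources_generally_available",
    "at_least_as_effective",
    "matches_generalizable_pattern",
)


def failed_criteria(assessment: Dict[str, bool]) -> List[str]:
    """List the criteria that do not hold; a missing entry counts as failing."""
    return [name for name in CRITERIA if not assessment.get(name, False)]


def should_generalize(assessment: Dict[str, bool]) -> bool:
    """A new schema is warranted only if all five criteria hold."""
    return not failed_criteria(assessment)


# Assumed judgments for the blackmail story: every criterion holds.
blackmail = dict.fromkeys(CRITERIA, True)

# Assumed judgments for a "Mission Impossible" plot: novel and successful,
# but the goal is peculiar and the resources are highly specialized.
mission_impossible = dict(
    blackmail,
    goal_is_general=False,
    resources_generally_available=False,
)
```

Returning the list of failed criteria, rather than a bare yes or no, makes it easy to see why a given input was rejected.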

2.3. The Generalization Process

Assuming the input is completely understood (with data dependency links to
inference rules justifying the understanding) and the five tests for learning have
all been met, we must now perform the actual generalization. The generalization process
consists of replacing the objects and actions in the understood representation
with abstract counterparts. These counterparts are the most abstract possible while
still preserving the validity of the inference justification network.

Consider  again  the  example  of  John  blackmailing  Fred.  One  proposition  that  is  a 
part  of  the  understood  representation  is  that  Fred  decided  to  pay  John  $15,000.  This 
decision  event  is  not  mentioned  in  the  story;  it  is  one  of  the  events  that  must  be 
inferred  during  understanding  to  build  a  causally  complete  representation.  This 
action is justified to the system by a number of other propositions. Among these
supporting propositions are some supplied by the schema DECIDE (which we assume the
system already possesses). These inferences from DECIDE are:

1)  The  decider  must  be  at  least  a  higher  animate. 

2)  The  decider  must  be  capable  of  a  number  of  alternative 
possible  actions. 

3)  The  decider  must  know  what  the  alternatives  are. 

4) The chosen alternative will be among the most beneficial/least
detrimental  to  the  decider. 

Thus, these (and other) justifications are tied to the representation of Fred's
decision through data dependency links. Fred's decision is believable to the system
because 1) Fred is, in fact, a higher animate, 2) he knows at least two alternatives
(paying John, or losing the $100,000 and being arrested), and 3) he probably sees losing
$15,000 as less detrimental than losing $100,000 and going to jail. These
justifications are supplied in the form of pointers to the above inference rules during
the understanding procedure.

Now  consider  the  process  of  generalizing  the  amount  of  $15,000.  This  money 
appears in several places in the representation. Among them is Fred's decision to give
the  money  to  John.  The  generalization  process  asks  "What  are  the  constraints  that 
the  $15,000  must  satisfy?"  They  are:  1)  Fred  must  possess  it  or  be  able  to  get  it 
and  2)  Fred  must  value  it  less  than  the  alternative.  Thus,  the  system  realizes  that 
values  other  than  $15,000  would  also  work.  Any  amount  would  work  such  that  1)  Fred 
can raise that amount and 2) Fred would still prefer giving it up to the alternative.
Thus  the  amount  of  $15,000  is  replaced  by  a  schema  variable  with  the  appropriate 
binding  constraints.  Of  course,  Fred  is  also  generalized.  He  is  replaced  by  a 
schema  variable  that  must  be  a  human  who  has  performed  an  illegal  act  that  is  known 
to  a  person  who  is  the  schema  variable  replacing  John.  Thus  the  binding  constraints 
on  the  variable  that  replaces  $15,000  are  specified  in  terms  of  the  variable  that 
replaces  Fred.  Events  are  generalized  as  well.  The  event  of  John  sending  Fred  an 
interoffice  memo  is  generalized  to  any  communication  method.  The  system  realizes  via 
the inference justification network that the important effect of that event is to
achieve  a  certain  mental  state  in  Fred.  The  precise  method  is  of  no  importance. 
Some  elements  of  the  input  story  are  totally  discarded.  For  example,  the  fact  that 
Fred  is  John's  boss  is  not  mentioned  in  the  inference  justification  network.  It  is 
therefore eliminated from the new schema.
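
The two generalization moves described above, replacing a constant by a constrained schema variable and discarding facts the justification network never mentions, can be sketched as follows. This is a simplified illustration under stated assumptions: the class SchemaVariable, the binding keys, and the helper prune are hypothetical, and treating "the alternative" as a single numeric cost is a simplification made only so that the constraint can be checked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Bindings = Dict[str, float]


@dataclass
class SchemaVariable:
    name: str
    constraints: List[Callable[[Bindings], bool]]

    def accepts(self, bindings: Bindings) -> bool:
        """A binding is acceptable only if every constraint holds."""
        return all(check(bindings) for check in self.constraints)


# The variable that replaces $15,000, with the two constraints from the text.
amount = SchemaVariable(
    name="?amount",
    constraints=[
        # 1) the victim must possess the amount or be able to raise it
        lambda b: b["amount"] <= b["victim_can_raise"],
        # 2) the victim must value it less than the alternative (exposure)
        lambda b: b["amount"] < b["victim_cost_of_exposure"],
    ],
)


def prune(facts: List[str], justified: Set[str]) -> List[str]:
    """Keep only the facts that the justification network refers to."""
    return [fact for fact in facts if fact in justified]


story_facts = ["Fred embezzled money", "John sent Fred a memo", "Fred is John's boss"]
used_in_network = {"Fred embezzled money", "John sent Fred a memo"}
kept = prune(story_facts, used_in_network)  # "Fred is John's boss" is dropped
```

For instance, with illustrative figures, amount.accepts({"amount": 15000, "victim_can_raise": 50000, "victim_cost_of_exposure": 100000}) holds, while a demand larger than the cost of exposure is rejected.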

Through  these  and  other  generalizations  the  system  can  construct  a  first  version 
of  a  BLACKMAIL  schema.  The  schema  might  not  be  perfect.  There  may  be  later  stories 
that  do  not  quite  fit  and  require  further  modification  of  the  schema.  However,  it  is 
a  reasonably  general  schema  that  is  likely  to  help  a  good  deal  in  processing  future 
similar  stories. 

3. Conclusion

There are several concluding points:

1) Unlike many learning systems (e.g., [9], [12], [21], [23]), explanatory schema
acquisition does not depend on correlational evidence. It is capable of one-trial
learning, but the learning is not based on analogical reasoning like [20] and [22].
It is somewhat similar to Soloway's view of learning [18]. There is also some resemblance
to the MACROPS notion in the STRIPS system [5].

2)  The  approach  is  heavily  knowledge-based.  A  great  deal  of  background  knowledge 
must  be  present  for  learning  to  take  place.  In  this  respect  explanatory  schema 
acquisition follows the current trend in AI learning and discovery systems, perhaps
traceable to Lenat [11].

3) The learning mechanism is not "failure-driven" as is the MOPs approach ([15], [8],
[10]). In that view learning takes place in response to incorrect predictions by the
system. In explanatory acquisition learning is usually stimulated by positive inputs
which encounter no particular problems or prediction failures.

4) The absolute representation power of the system is not enhanced by learning new
schemata. This statement is only superficially surprising. Indeed, Fodor [7] shows
that this must be true of all self-consistent learning systems. Explanatory schema
acquisition does, however, increase processing efficiency. Since all real-world systems
are resource limited, this learning technique does, in fact, increase the
system's processing power.